Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify that Reader::read doesn't return 0. #18289

Closed
wants to merge 1 commit into from
Closed

Specify that Reader::read doesn't return 0. #18289

wants to merge 1 commit into from

Conversation

mahkoh
Copy link
Contributor

@mahkoh mahkoh commented Oct 24, 2014

Reader is a trait for byte-oriented streams. Unlike with packet oriented
protocols such as UDP or lower level network traffic, there is no reason
for a read on a stream to ever return a zero bytes buffer.

If data is available, read shall return a suitable amount of it in the
buffer. If no data is available, read shall either block or signal a
condition via an Error.

Note that it is already impossible to use many Reader methods on packet
oriented protocols. E.g., consider reading an u64.

  1. If a new packet is available, what should be done with the remainder
    of the packet?
  2. If we've stored a previous packet and there are still four bytes
    available, should we a) attempt to read another packet and possibly
    block and then combine the bytes into an 8 byte sequence, or b) return
    an error?

Objects that implement packet oriented protocols or for which it makes
sense to return 0-sized reads should implement read methods that are
independent of the Reader trait.

This is a breaking change because there might be Reader implementations
that return 0, however, no such implementations are in the standard
library.

[breaking-change]


Note that the last paragraph is AFAIK true after #18130

Closes #18079

@pcwalton
Copy link
Contributor

r? @aturon @alexcrichton

@mahkoh
Copy link
Contributor Author

mahkoh commented Oct 24, 2014

Once this has been merged, some Reader functions can be simplified. E.g., read_u8 currently calls a large function (read_at_least) which then repeatedly calls read (up to 1000 times if read returns 0.) This might be the reason that read_u8 is significantly slower than read(buf) where buf has size 1.

Reader is a trait for byte-oriented streams. Unlike with packet oriented
protocols such as UDP or lower level network traffic, there is no reason
for a read on a stream to ever return a zero bytes buffer.

If data is available, read shall return a suitable amount of it in the
buffer. If no data is available, read shall either block or signal a
condition via an Error.

Note that it is already impossible to use many Reader methods on packet
oriented protocols. E.g., consider reading an u64.
1) If a new packet is available, what should be done with the remainder
of the packet?
2) If we've stored a previous packet and there are still four bytes
available, should we a) attempt to read another packet and possibly
block and then combine the bytes into an 8 byte sequence, or b) return
an error?

Objects that implement packet oriented protocols or for which it makes
sense to return 0-sized reads should implement read methods that are
independent of the Reader trait.

This is a breaking change because there might be Reader implementations
that return 0, however, no such implementations are in the standard
library.

[breaking-change]
@mahkoh
Copy link
Contributor Author

mahkoh commented Oct 25, 2014

I forgot about the case where the buffer is empty. I think it's best to leave this case up to the implementation.

/// # Error
/// # Return value
///
/// If the length of `buf` is 0, the return value is unspecified.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be changed to say that implementations may either return Ok(0) or Err(...) at their choosing? It seems like that shouldn't effectively limit implementations and it'll be good to forbid totally silly values like Ok(1123465234).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have cases in mind where returning Ok(1) would be dangerous?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Methods that pass a buffer to read and then slice it down to the part that was filled will panic, for example.

@mahkoh
Copy link
Contributor Author

mahkoh commented Oct 26, 2014

Even after this, I think splitting Reader up into a basic trait and a specialized trait for blocking readers is inevitable. The documentation of Reader already makes a passing remark that

read [...] will block until data is available

but this should not be required in general and is not true in today's rust programs, even in the stdlib. E.g., there is a long issue talking about stdout being non-blocking in some cases and how this case should be handled (libnative currently doesn't handle this case.)

The read function itself and some extension methods are still useful in the non-blocking case and restricting a trait as common as Reader to blocking implementations will inevitably result in broken implementations.

The following methods can theoretically work fine with non-blocking read:

  • read
  • read_u8
  • read_i8
  • push

The rest will not work. Consider a reader implemented on top of a large but finite non-blocking byte stream.

read_u16 and friends might read one byte and then an error. At this point you have to decide if you loop indefinitely or discard the first byte. The second alternative will most likely trash the whole stream.

read_to_end and bytes should be obvious.

In general, everything that might call read more than once doesn't work.

How should this be handled? Some ideas:

  • Add a method is_blocking to Reader. All methods that call read multiple times will first check this function and return an error immediately.
  • Add a method read_blocking which does ^
  • Split Reader into a general and a blocking trait.

A basic implementation of the third version including optimized versions of read_u16 and friends can be found here: https://gist.github.com/anonymous/a580fa6c6fd0564c7240

@alexcrichton
Copy link
Member

The worry of non-blocking reads has been brought up in the past, and it definitely seems similar to the UDP where it's packet-oriented instead of byte-oriented. I'm somewhat wary to bolt on this "return of 0 is undefined" behavior clause to Reader in an attempt to fix it without considering it as a more holistic design. As it looks like you're describing, the Reader trait is mostly incompatible with nonblocking reads in terms of discarding data.

For Reader specifically (as written), I do not think we should say that a return value of 0 is undefined. This seems very similar to the "iterator protocol" question of what to do after an iterator returns None once, and I've definitely had 0-length reads come up in practice occasionally.

All in all, I do understand that the story of Reader + nonblocking IO isn't so hot, but I'd want to move carefully in this space rather than just tack on assumptions here and there to band-aid it up. The split you've made of Reader and BlockingReader seems like a great start!

@emberian
Copy link
Member

@mahkoh @alexcrichton any update on this? It's been sitting for almost a month now.

@alexcrichton
Copy link
Member

@cmr thanks for the reminder. I think that I'm going to close this out of inactivity, and I think that this will definitely come up when we get around to stabilizing std::io. There are thoughts to cut back on the scope of Reader to purely just a read method, for example, which should address at least some of the concerns here. For now though it may be best to clear up @bors's queue.

lnicola pushed a commit to lnicola/rust that referenced this pull request Oct 17, 2024
…nicola

Fix panic when json project has relative buildfile paths

The `build_file` path may be relative to the workspace root.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disallow Readers that return Ok(0) from read
5 participants